Impact of Macroeconomics Elements on the Crime Rate in the USA
1. Introduction
1.1 Overview and motivation
We chose the topic of crime because it is always topical, as shown today by the growing number of crimes against humanity, hate crimes and violent acts around the world. Multiple wars in Europe and the Middle East; dubious governments in Africa; social networks that have become a digital court or a channel for harassment; and religious institutions such as the Church, which has been described as one of the biggest networks of sexual assault and pedophilia: these are just a few signs suggesting that the situation is deteriorating year after year. But has this always been the case?
The U.S. is the world’s largest economic and military power but is still heavily criticized for its laws on the right to bear arms, and for racism, poverty, and failing health and justice systems. The United States is no exception to the rule of increasing crime rates. Some cities, such as Houston, Texas, are considered among the most dangerous on American soil, where crime is described as out of control and where murders and trafficking of all kinds are carried out almost freely. This worrying case illustrates the particular situation of the United States regarding public safety, as stated in the article “Is Crime up or down? In Houston, concerns are hard to allay” from The Associated Press.
As all three of us have a real interest in this country, making it the focus of our study seemed an obvious choice. The United States is a federal state, which means that autonomous entities run the country at their own level: each state has its own constitution and laws. Because of its great political, cultural, and social diversity, the United States is well suited to our study, which will be all the more thorough thanks to the wealth of available data.
Also, as explained in the article “Stories about crime are rife with misinformation and racism, critics say”, opinions are strongly divided about the information relayed by the media on crime. Some say that public opinion is heavily manipulated by a government that controls the media; others defend the objectivity of the American media. For this reason, we would like to conduct our own study and better understand crime in the United States.
1.3 Research question
The purpose of this analysis is to estimate the impact of several macroeconomic variables on the crime rate in the United States. For this project we consider only violent crime, namely: homicide, rape, aggravated assault, and robbery. In addition, we consider the following macroeconomic variables: population, poverty rate, foreign-born persons residing in the United States, Police and Sheriff’s Patrol Officers, number of federal firearms licensees, income inequality, and unemployment rate. These variables are reported by state for the year 2021 (2020 for data that have not been updated for the year 2021).
We consider the following five questions as guidelines for our study:
Which states have high crime rates? We want to rank the states according to their crime rates, to identify potential similarities and/or differences, and to create groups that are representative of the entire country.
How have crime rates changed over time? We want to find out whether states with high crime rates have always been high-crime states or whether they have become so (and, if so, to understand why).
What are the common variables influencing crime rates? We will identify the variables that most impact the crime rate in each group by listing the most common variables potentially responsible for violent crime.
What impact do these different variables have on the crime rate? We would like to estimate the weight on the crime rate of each variable identified in question 3.
How can we optimize the impact of these variables on crime? We would like to identify the variables that can realistically be acted upon, in order to propose feasible solutions to decrease the crime rate.
1.4 Data
1.4.1 Where you can get it
Source: https://crime-data-explorer.fr.cloud.gov/pages/explorer/crime/crime-trend
Our first and largest database is from the Federal Bureau of Investigation’s Crime Data Explorer. This data provides us with information on the rate of violent crime, rape, aggravated assault, homicide, and robbery over a 35-year period (1985 to 2020), which allowed us to create two data frames:
1st Data Frame:
| Variable | Meaning |
|---|---|
| Series | Information about where the crime(s) was/were committed |
| Rate | Violent crime rates by location by year |
| 1985-2020 | Year considered for rate calculation |
2nd Data Frame:
| Variable | Meaning |
|---|---|
| Series | Information about where the crime(s) was/were committed |
| Type_of_crime | Type of crime committed |
| Rated | Rate calculated according to the type of crime, the location, and the year |
| 1985-2020 | Year considered for rate calculation |
The FBI has stated that its crime statistics are not based on data collected by 100% of law enforcement agencies. In addition, some crimes are not reported to the police or remain unsolved. Therefore, we cannot say that the data fully reflect reality, but for the sake of our study, we will assume that they do.
Regarding the rape data, the data yields two different rates: “Legacy Rape” and “Revised Rape.” In 2013, the FBI began collecting rape data under a revised definition and removed the word “forcible” from the offense name. We then made the decision to keep only the “Revised Rape” data from 2013, since it ultimately encompasses both definitions of rape.
To answer one of our research questions, we need the violent crime rate by state for the year 2021, which we found at the following site. We noticed that these rates are identical to the 2020 rates for every state. This is consistent with Covid having had little or no impact on the violent crime rate, as the article “Federal surveys show no increase in U.S. violent crime rate since start of the pandemic” suggests. We therefore use the same rates for 2020 and 2021.
Source: https://worldpopulationreview.com/states
This database has 9 variables, but only 4 are of interest to us. It gives the population of the United States by state in 2021 and 2022.
| Variable | Meaning |
|---|---|
| State | State where the population was measured |
| Pop | Population by geographic area in 2022 |
| Pop2021 | Population by geographic area in 2021 |
| Growth | Population growth rate from 2021 to 2022 |
Source: https://www.americanprogress.org/data-view/poverty-data/poverty-data-map-tool/
This database provides the poverty rate, unemployment rate, and income inequality by state for the year 2021. Data collection by the U.S. Census Bureau was interrupted by the Covid crisis, so the published figures are experimental estimates that we will treat as accurate for our study.
| Variable | Meaning |
|---|---|
| State | State considered for the calculation of the rate |
| Official poverty rate | Poverty rate by state (in %) |
| Unemployment rate | Unemployment rate by state (in %) |
| Income inequality | Income inequality by state (in %) |
Source: https://www.atf.gov/firearms/docs/report/2021-firearms-commerce-report/download
This database is taken from the Bureau of Alcohol, Tobacco, Firearms and Explosives’ annual report published in 2021 which reports data for the year 2020 (page 22). We do not have data for the year 2021 since the report will not be published until December 2022, but we will consider the same data for the year 2021. We do not have data for guns bought and sold on the black market, so we will only consider legal licenses.
| Variable | Meaning |
|---|---|
| State | State considered for the calculation of the number of firearms licensees |
| FFI. Population | Number of Federal Firearms Licensees |
Source: https://data.bls.gov/oes/#/home
We found this database on the US Bureau of Labor Statistics website by selecting “One occupation for multiple geographic areas” → “Police and Sheriff’s Patrol Officers” → “State” → “All states in this list”. It gives the number of police and sheriff’s patrol officers by state in May 2021. There are 18 variables in this database, but we will focus on 2 of them.
| Variable | Meaning |
|---|---|
| States | State considered for the calculation of the number of employees |
| Employement | Number of employees by state |
Source: https://www.pewresearch.org/hispanic/2020/08/20/facts-on-u-s-immigrants-current-data/
This database (“Nativity of U.S. immigrants”) gives us the number of people residing in the United States in 2018 who were born abroad. It has 14 variables, but only 2 matter to us. We will not consider undocumented arrivals.
| Variable | Meaning |
|---|---|
| States | State where the number of immigrants was measured |
| Foreign born | Number of foreign-born US residents |
2. Exploratory Data Analysis
2.1 Type of crimes dataset
2.1.1 What is the evolution of the crime rate in the USA?
The first striking thing we notice on this graph is that there are two distinct trends: from 1985 to 1992 the violent crime rate increases, while from 1992 to 2020 it decreases overall.
In fact, crime increased enormously from World War II until the early 1990s and then declined sharply, due in part to the strong political commitment to law enforcement of President Bill Clinton (Democratic Party), who hired more police and provided more funding to crime control institutions.
Increased immigration, higher wages, changing demographics, and abortion rights have also been put forward to explain this decrease in crime, as explained on this web page.
2.1.2 Which states have the highest crime rates?
The states chosen here have the highest crime rates in 2020. We found it interesting to see whether this has always been the case. In this graph, we can see that between 1990 and 1995 the violent crime rate tends to increase. Then, from 1995 to 2000, all 10 states experienced a decrease in the crime rate. From 2013, the rate increases again, in particular because of the revised definition of rape.
Focusing on Alaska and Tennessee, which have the first and third highest crime rates in 2020, respectively, we can see from the trend curves that in the 1990s these two states were average compared to the others.
Conversely, New Mexico, which is in second place in 2020, already had a very high crime rate in the 1990s. This can potentially be explained by the wave of Latin American migration (from Mexico) that the USA experienced from the 1970s and 1980s. It is also interesting to see that states like Louisiana or South Carolina, which were at the top of the list in the 1990s, have significantly reduced their crime rate.
2.1.3 Which states have the lowest crime rates?
The 10 states chosen here are those with the lowest crime rates in 2020. We thought it would be interesting to analyze their evolution. New Jersey and Connecticut are the states that experienced the biggest drop in violent crime rates between the 1990s and 2000. All the others remained more or less the same.
This can be explained by changes in laws, the renewal of the police
force, and a different approach to crime. For example, New Jersey had to
reform its entire police force because of corruption cases. The new
forces in place are trained primarily in de-escalation and dialogue
during interventions.
In the case of Connecticut, it is mainly the reclassification of crimes as felonies and the granting of greater discretionary power to judges (the freedom to decide case by case) that led to a significant reduction in the crime rate.
2.1.4 Which states have the highest crime rates over the 2015-2020 period?
The barplot above shows the average violent crime rate by state between 2015 and 2020. We can clearly see that, even when averaging across 2015-2020, the three states with the highest crime rates remain the same as before: Alaska, New Mexico and Tennessee.
The interactive map shows the same data as the previous barplot. Thanks to the color code, we can see which states have the highest average crime rate between 2015 and 2020, as well as their geographical location. The map makes it obvious that the states with the highest crime rates are only a minority of the country (only 9/50).
2.1.5 What are the most frequent crimes?
From the data collected, it is clear that between 2015 and 2020 the most common violent crime is aggravated assault, followed by robbery, rape and homicide.
2.1.6 How have they evolved over time?
In the four graphs, one for each type of violent crime, we can see that the trend for robbery, aggravated assault and homicide is the same. These three rates fell by more than half between 1990 and 2010. This strong decrease has been explained by the following theories: improved police strategies and, more generally, a better police force; and wider access to abortion which, according to these theories, reduced the number of children born to poor and single teenage mothers. Roughly speaking, it is assumed that these children were the most likely to become delinquents.
Another theory points to the Obama presidency: according to it, the fact that the new president was a man of color would have led to a decrease in crimes committed by people of color.
However, there was an increase around 2013-2014 in the rate of
homicides and aggravated assault. This is explained, depending on the
state, by a return to gang warfare, drugs and gun sales.
Concerning the evolution of the rape rate, recall that the definition of rape used in the statistics was revised in 2013. Previously, rape only covered the forced penetration of a woman’s vagina by a man’s penis. Today, the definition is as follows: The penetration, no matter how slight, of the vagina or anus with any body part or object, or oral penetration by a sex organ of another person, without the consent of the victim. This explains the strong increase between 2013 and today.
2.2 Formation of State clusters
2.2.1 Find the ideal number of clusters
In order to compare states more efficiently, we decided to run a k-means clustering on four dimensions: Robbery, Homicides, Aggravated Assaults and Rape. We first scaled the data and then added the state names as row names.
To determine the ideal number of clusters, we used the adjusted R-squared criterion of Optimal_Clusters_KMeans.
The adjusted R-squared \(R^2_a\) increases with the number of clusters, which means that the clustering explains a larger share of the variance. Here we see that from 4 clusters onwards, the adjusted R-squared is over 80%. Because its values are much higher than those of the other states, Alaska ends up alone in a cluster, so we decided, in order to have the most relevant model possible, to use 5 clusters, which we represent using fviz_cluster.
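The selection step can be sketched in base R as follows. This is only an illustration under stated assumptions: the data are synthetic (three artificial groups, not the FBI rates), and the plain ratio of between-cluster to total sum of squares from `kmeans` is used as a stand-in for the adjusted R-squared criterion reported by Optimal_Clusters_KMeans.

```r
# Synthetic stand-in for the scaled crime rates (NOT the report's data):
# three well-separated groups of 20 "states" in four dimensions.
set.seed(42)
toy <- rbind(
  matrix(rnorm(80, mean = 0, sd = 0.3), ncol = 4),
  matrix(rnorm(80, mean = 3, sd = 0.3), ncol = 4),
  matrix(rnorm(80, mean = 6, sd = 0.3), ncol = 4)
)
colnames(toy) <- c("Robbery", "Homicides", "Aggravated_assault", "Rape")
toy_scaled <- scale(toy)  # centre and scale each column

# Share of the total variance explained by the clustering, for k = 1..6.
explained <- sapply(1:6, function(k) {
  fit <- kmeans(toy_scaled, centers = k, nstart = 25)
  fit$betweenss / fit$totss
})
print(round(explained, 3))
```

Reading the printed vector from left to right shows where adding a cluster stops bringing a large gain, which is the same kind of reading we applied to the adjusted R-squared curve.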
2.2.2 Group the States according to their crime behavior
In the PCA plot of the variables, we see that the variables
Rape_Scaled and Aggravated_Assaults_Scaled are correlated, as well as
the variables Homicides_Scaled and Robbery_Scaled.
In the graph
representing the five clusters, dimension 1 is an average between the
correlated variables Homicides_Scaled and Robbery_Scaled; and dimension
2 is an average between the correlated variables Rape_Scaled and
Aggravated_Assaults_Scaled.
km.res[["centers"]]
## Robbery_scaled Homicides_scaled Aggravated_assault_scaled Rape_scaled
## 1 0.7881724 1.4176379 1.48108088 0.4370596
## 2 1.1558337 1.0691436 2.75152733 4.5779411
## 3 0.8194608 0.4437583 -0.03728383 -0.5420362
## 4 -0.6686191 -0.8708740 -0.88735816 -0.3830740
## 5 -0.7665022 -0.3274591 0.21047178 0.7121579
Through the analysis of the scaled data, we assigned certain
characteristics to the clusters:
- Cluster 1 is characterized by a Homicide rate higher than most (highest scaled average).
- Cluster 2 is characterized by a Robbery rate higher than most (highest scaled average), an Aggravated Assault rate higher than most (highest scaled average), and a Rape rate higher than most (highest scaled average).
- Cluster 3 is characterized by a Rape rate lower than most (lowest scaled average).
- Cluster 4 is characterized by a Robbery rate lower than most (second lowest scaled average), a Homicide rate lower than most (lowest scaled average), and an Aggravated Assault rate lower than most (lowest scaled average).
- Cluster 5 is characterized by a Robbery rate lower than most (lowest scaled average).
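The reading of the centres can be checked mechanically. The short sketch below re-enters the cluster centres printed above (the matrix name `centers` and the shortened column names are ours) and looks up, for each crime type, the cluster with the highest and the lowest scaled mean.

```r
# Cluster centres copied from the km.res[["centers"]] output above.
centers <- matrix(
  c( 0.7881724,  1.4176379,  1.48108088,  0.4370596,
     1.1558337,  1.0691436,  2.75152733,  4.5779411,
     0.8194608,  0.4437583, -0.03728383, -0.5420362,
    -0.6686191, -0.8708740, -0.88735816, -0.3830740,
    -0.7665022, -0.3274591,  0.21047178,  0.7121579),
  nrow = 5, byrow = TRUE,
  dimnames = list(as.character(1:5),
                  c("Robbery", "Homicides", "Aggravated_assault", "Rape"))
)

# For each crime type, which cluster has the highest / lowest scaled mean?
highest <- apply(centers, 2, which.max)
lowest  <- apply(centers, 2, which.min)
print(rbind(highest, lowest))
```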
2.3 Macroeconomic variables Dataset
2.3.1 Macroeconomic variables behavior depending on clusters
We would like to see what characterizes each of the five clusters, based on the mean of each macro-variable. We created six barplots to facilitate the analysis:
In the first barplot, cluster 3 stands out from the other four: the average number of foreign-born residents in the 14 states of this cluster is 12,271 per 100,000. Cluster 5 has a low average of 5,255.
Concerning the average number of firearms licenses, cluster 2, which contains the state of Alaska, is much higher than the others, with an average of 113 per 100,000 inhabitants. Cluster 3 has the lowest average, 34.
The unemployment averages of the five clusters are quite similar. Nevertheless, cluster 2 (Alaska) stands out with a rate of 6,400 per 100,000, while cluster 5 has the lowest average, 4,000.
The police officer averages are fairly similar, but we can see that
Cluster 1 has a slightly higher average than the others. Clusters 2 and
4 have almost the same average.
Regarding the poverty rate per 100,000 inhabitants, cluster 1 clearly
has the highest average of 15675. In addition, clusters 2 and 4 again
have almost the same average, as do clusters 3 and 5.
Clusters 1 and 3 have the highest income inequality averages, cluster 2 has the lowest. Clusters 4 and 5 have similar averages.
2.3.2 Macroeconomic variables Scatterplots
In this section, we have mainly plotted graphs showing distribution of the crime rate depending on each macroeconomic variable.
Distribution of Foreign_Born
It is difficult to tell the trend from the scatterplot alone. Nevertheless, after adding the regression line, we see that the relationship between the two variables is negative. This result is surprising because we expected a higher immigration rate to go with more delinquency. Clusters 1 and 3, however, show a positive trend.
Distribution of Weapons_licences
The scatterplot shows a predominantly positive relationship between the two variables, which is confirmed by the regression line. This means that the more firearms licenses are issued, the higher the crime rate tends to be. The slope is not very steep, however, and it would be interesting to see what impact this variable has on the different types of crime.
Distribution of Unemployement rate
The regression line is increasing, so the unemployment rate is positively associated with the crime rate, especially for cluster 1, whose scatterplot shows a clearly positive trend.
Distribution of Officers
As in the previous graph, the number of law enforcement officers seems to be positively associated with the crime rate. This is quite surprising: intuitively, one would think that the more law enforcement there is, the less crime there is. The points in each cluster are more scattered than in the previous graphs.
Distribution of Poverty rate
This graph shows an increasing line with a steep slope. It can be argued that the poverty rate has a strong positive influence on the crime rate.
Distribution of Income Inequality
The regression line is increasing but its slope is still quite low. The effect of income inequality on each of the four types of crime would need to be studied to clarify this relationship.
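The readings in this section all rest on the sign and size of the slope of a simple regression line. As a minimal sketch on made-up numbers (the variables `x` and `y` are synthetic, not our data), the slope can be extracted like this:

```r
# Toy data (NOT the report's): a noisy, mildly negative relationship.
set.seed(1)
x <- runif(50, min = 1000, max = 20000)    # e.g. foreign-born per 100,000
y <- 500 - 0.005 * x + rnorm(50, sd = 40)  # e.g. violent crime rate

fit   <- lm(y ~ x)
slope <- unname(coef(fit)["x"])
cat("slope:", signif(slope, 3), "\n")      # the sign gives the direction of the trend

# plot(x, y); abline(fit)  # uncomment to draw the scatterplot and the fitted line
```

A cloud of points like this one can look trendless to the eye, which is why the fitted line is useful for deciding the direction of the relationship.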
2.4 Assumptions
2.4.1 Assumption on what impacts the Robbery rate
What we know from the above analysis: Cluster 4 has a low average robbery rate and a low unemployment rate. Cluster 5 also has a low average robbery rate and a low unemployment rate. Both clusters also have almost the same average income inequality. Cluster 1 has a high average robbery rate and a high unemployment rate.
Therefore, it can be hypothesized that the robbery rate in a state is influenced by the unemployment rate and by income inequality.
2.4.2 Assumption on what impacts the Homicides rate
What we know from the above analysis: Cluster 1 has the highest
average homicide rate, low immigration rate, high poverty rate, and high
law enforcement membership rate. Cluster 4 has a low average homicide
rate, a high immigration rate, a low poverty rate, and a low law
enforcement rate.
Therefore, it can be hypothesized that the homicide rate in a state is influenced by the immigration rate, poverty rate, and law enforcement membership rate.
2.4.3 Assumption on what impacts the Aggravated Assault rate
What we know from the analysis above: Cluster 2 has a high average rate of aggravated assaults, a high rate of firearms licenses issued, and high unemployment. Cluster 4 has a low average rate of aggravated assaults, a low rate of firearms licenses issued, and low unemployment.
Thus, it can be hypothesized that the rate of aggravated assaults in a state is influenced by the rate of firearm licenses issued and the unemployment rate.
2.4.4 Assumption on what impacts the Rape rate
What we know from the analysis above: Cluster 2 has a high average
rape rate, high rate of firearms licenses issued, low income inequality.
Cluster 3 has a low rate of rape, a low rate of firearms licenses
issued, and a high rate of income inequality.
Thus, it can be hypothesized that the rate of rape in a state is influenced by the rate of firearms licenses issued and the rate of income inequality.
3. Modeling
First, we decide to remove Alaska as an outlier since this state
clearly stands out from the others due to its particularly high violent
crime rates as we saw in the exploratory data analysis.
3.1 Determine the robbery rate through the macro variables
We started by calculating the correlation matrix of this model:
corr_matrixRobb
| | Robbery | Foreign_Born21 | Weapons_licences | Unemployement_rate | Officers | Income_Inequal |
|---|---|---|---|---|---|---|
| Robbery | 1.000 | 0.525 | -0.579 | 0.602 | 0.141 | 0.460 |
| Foreign_Born21 | 0.525 | 1.000 | -0.582 | 0.636 | -0.036 | 0.481 |
| Weapons_licences | -0.579 | -0.582 | 1.000 | -0.429 | -0.035 | -0.420 |
| Unemployement_rate | 0.602 | 0.636 | -0.429 | 1.000 | 0.103 | 0.707 |
| Officers | 0.141 | -0.036 | -0.035 | 0.103 | 1.000 | 0.454 |
| Income_Inequal | 0.460 | 0.481 | -0.420 | 0.707 | 0.454 | 1.000 |
We decided to calculate the following regression, which considers all the macro-variables presented above, to determine the robbery rate :
\(Robbery =\beta_0 + \beta_1* ForeignBorn21 + \beta_2 * WeaponsLicences + \beta_3 * UnemployementRate + \\\beta_4 * Officers + \beta_5 * PovRate +\beta_6 * IncomeInequal\)
We find an \(R^2\) of 0.5244, which means that 52% of the variation in the robbery rate is explained by the linear relationship with the macro variables.
By applying backward selection based on AIC, the Officers variable (the number of police employees) is dropped. Our final regression is then:
\(Robbery =\beta_0 + \beta_1* ForeignBorn21 + \beta_2 * WeaponsLicences + \beta_3 * UnemployementRate + \\ \beta_4 * PovRate +\beta_5 * IncomeInequal\)
summary(regRobbery)
##
## Call:
## lm(formula = Robbery ~ Foreign_Born21 + Weapons_licences + Unemployement_rate +
## Pov_rate + Income_Inequal, data = Bigdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.093 -11.215 2.608 10.705 53.207
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.0311129 25.9726415 1.041 0.3038
## Foreign_Born21 0.0014237 0.0008696 1.637 0.1089
## Weapons_licences -0.3090258 0.1292561 -2.391 0.0213 *
## Unemployement_rate 0.0081597 0.0037536 2.174 0.0353 *
## Pov_rate 0.0036082 0.0016160 2.233 0.0308 *
## Income_Inequal -0.0032709 0.0022878 -1.430 0.1600
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19.39 on 43 degrees of freedom
## Multiple R-squared: 0.5423, Adjusted R-squared: 0.4891
## F-statistic: 10.19 on 5 and 43 DF, p-value: 1.738e-06
We see that three of the five variables retained are statistically significant at the 5% level. We therefore assume that Weapons licences, Unemployment rate and Poverty rate have a relevant impact on the robbery rate in the United States. Removing the variables that are not statistically different from 0 could, however, change or cancel the estimated impact of the others on the dependent variable. It is therefore important to keep all the variables that the AIC-based backward selection retained.
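For readers who want to reproduce the selection step, a minimal sketch with base R's `step()` is given below. The data frame `toy` and its columns are synthetic and purely illustrative; we do not claim this is the exact call used for the models in this report.

```r
# Synthetic data (NOT the report's): y depends on x1 and x2 only; x3 is pure noise.
set.seed(123)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 2 * x1 - 1.5 * x2 + rnorm(n)
toy <- data.frame(y, x1, x2, x3)

full    <- lm(y ~ x1 + x2 + x3, data = toy)
# Backward selection: drop one term at a time for as long as the AIC decreases.
reduced <- step(full, direction = "backward", trace = 0)

print(formula(reduced))
print(c(full = AIC(full), reduced = AIC(reduced)))
```

Because the procedure compares AIC values rather than p-values, a variable can be kept in the final model even when its coefficient is not significant at the 5% level, as happens in the robbery regression.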
We check if there is a multi-collinearity issue :
| Variables | Tolerance | VIF |
|---|---|---|
| Foreign_Born21 | 0.286 | 3.494 |
| Weapons_licences | 0.606 | 1.649 |
| Unemployement_rate | 0.376 | 2.659 |
| Pov_rate | 0.424 | 2.361 |
| Income_Inequal | 0.318 | 3.148 |
We have no VIF greater than 5, so we assume that
multi-collinearity is not a concern.
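The two columns of the table are linked by VIF = 1 / Tolerance, where the tolerance of a variable is one minus the R-squared obtained when regressing it on the other explanatory variables. A short sketch on synthetic predictors (the names `x1`, `x2`, `x3` are illustrative, not our variables) shows the computation by hand:

```r
# Synthetic predictors (NOT the report's): x2 is built to correlate with x1.
set.seed(7)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing x_j on the others.
vif <- sapply(names(X), function(v) {
  f  <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})
print(round(cbind(Tolerance = 1 / vif, VIF = vif), 3))
```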
| RMSE | MAE | MASE |
|---|---|---|
| 18.16062 | 14.38561 | 0.6619215 |
We see that this model has an MAE of about 14, which means that on average the model’s prediction is off by about 14 robberies per 100,000 inhabitants. For a state of 10 million inhabitants, for example, this corresponds to an average error of roughly 1,400 robberies.
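The three error measures can be computed by hand as sketched below on toy numbers. The scaling used for the MASE here (the mean absolute deviation of the observations from their mean) is an assumption made for this illustration; the function that produced the table above may scale differently.

```r
# Toy observed and predicted rates (NOT the report's data).
actual    <- c(50, 80, 120, 60, 90)
predicted <- c(55, 70, 110, 65, 100)

err  <- actual - predicted
rmse <- sqrt(mean(err^2))   # penalises large errors more strongly
mae  <- mean(abs(err))      # average size of an error, in the unit of the rate
# Assumption: MASE scaled by the error of a mean-only baseline model.
mase <- mae / mean(abs(actual - mean(actual)))

print(c(RMSE = rmse, MAE = mae, MASE = mase))  # MAE = 8 and MASE = 0.4 here
```

A MASE below 1 means that the model makes smaller errors on average than the baseline it is scaled against.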
We see that the points on the QQ-Plot do not fall exactly on the reference line but are not scattered far from it. The residuals therefore deviate slightly from normality, but we consider that the model remains usable.
Our first hypothesis was that the robbery rate depended on the unemployment rate and income inequality. Our model retains both variables, although only the unemployment rate is significant at the 5% level. The immigration rate, the number of firearms licenses issued and the poverty rate also enter the final model.
3.2 Determine the Homicides rate through the macro variables
We started by calculating the correlation matrix of this model:
corr_matrixHomi
| | Homicides | Foreign_Born21 | Weapons_licences | Unemployement_rate | Officers | Income_Inequal |
|---|---|---|---|---|---|---|
| Homicides | 1.000 | -0.214 | -0.203 | 0.156 | 0.430 | 0.350 |
| Foreign_Born21 | -0.214 | 1.000 | -0.582 | 0.636 | -0.036 | 0.481 |
| Weapons_licences | -0.203 | -0.582 | 1.000 | -0.429 | -0.035 | -0.420 |
| Unemployement_rate | 0.156 | 0.636 | -0.429 | 1.000 | 0.103 | 0.707 |
| Officers | 0.430 | -0.036 | -0.035 | 0.103 | 1.000 | 0.454 |
| Income_Inequal | 0.350 | 0.481 | -0.420 | 0.707 | 0.454 | 1.000 |
We decided to calculate the following regression, which considers all the macro-variables presented above, to determine the Homicides rate :
\(Homicides =\beta_0 + \beta_1* ForeignBorn21 + \beta_2 * WeaponsLicences + \beta_3 * UnemployementRate + \\ \beta_4 * Officers + \beta_5 * PovRate +\beta_6 * IncomeInequal\)
We find an \(R^2\) of 0.5811, which means that 58% of the variation in the homicide rate is explained by the linear relationship with the macro variables.
By applying backward selection based on AIC, the unemployment rate and income inequality variables are dropped. Our final regression is then:
\(Homicides =\beta_0 + \beta_1* ForeignBorn21 + \beta_2 * WeaponsLicences + \beta_3 * Officers + \\ \beta_4 * PovRate\)
summary(regHomicides)
##
## Call:
## lm(formula = Homicides ~ Foreign_Born21 + Weapons_licences +
## Officers + Pov_rate, data = Bigdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7030 -1.3072 -0.1916 1.3940 5.2785
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.7163060 2.4968823 -1.088 0.2826
## Foreign_Born21 -0.0001262 0.0000647 -1.951 0.0575 .
## Weapons_licences -0.0320626 0.0136464 -2.350 0.0233 *
## Officers 0.0174085 0.0098143 1.774 0.0830 .
## Pov_rate 0.0006448 0.0001299 4.962 1.09e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.048 on 44 degrees of freedom
## Multiple R-squared: 0.5923, Adjusted R-squared: 0.5552
## F-statistic: 15.98 on 4 and 44 DF, p-value: 3.763e-08
We see that of the four variables included in the regression, weapons licenses and the poverty rate are statistically significant, at the 5% and 0.1% levels respectively. We notice that the intercept is negative for this regression and is equal to -2.7. It may seem worrying to have a negative intercept for a rate, but this is not a problem: the intercept is the predicted homicide rate when all macro variables equal zero, a situation far outside the observed data, and it is not statistically different from 0 here (p = 0.28).
We check if there is a multi-collinearity issue:
| Variables | Tolerance | VIF |
|---|---|---|
| Foreign_Born21 | 0.577 | 1.733 |
| Weapons_licences | 0.607 | 1.647 |
| Officers | 0.835 | 1.198 |
| Pov_rate | 0.731 | 1.367 |
We have no VIF greater than 5, so we assume that
multi-collinearity is not a concern.
| RMSE | MAE | MASE |
|---|---|---|
| 1.940874 | 1.562546 | 0.6253413 |
We see that the points on the QQ-Plot do not fall exactly on the reference line but are not scattered far from it. The residuals therefore deviate slightly from normality, but we consider that the model remains usable.
Our second hypothesis was that the homicide rate depended on the immigration rate, the poverty rate, and the law enforcement rate. Our regression model retains all three, and the number of firearms licenses issued also impacts the homicide rate.
3.3 Determine the Aggravated Assault rate through the macro variables
We started by calculating the correlation matrix of this model:
corr_matrixAssault
| | Aggravated_assault | Foreign_Born21 | Weapons_licences | Unemployement_rate | Officers | Income_Inequal |
|---|---|---|---|---|---|---|
| Aggravated_assault | 1.000 | -0.181 | -0.004 | 0.045 | 0.292 | 0.202 |
| Foreign_Born21 | -0.181 | 1.000 | -0.582 | 0.636 | -0.036 | 0.481 |
| Weapons_licences | -0.004 | -0.582 | 1.000 | -0.429 | -0.035 | -0.420 |
| Unemployement_rate | 0.045 | 0.636 | -0.429 | 1.000 | 0.103 | 0.707 |
| Officers | 0.292 | -0.036 | -0.035 | 0.103 | 1.000 | 0.454 |
| Income_Inequal | 0.202 | 0.481 | -0.420 | 0.707 | 0.454 | 1.000 |
We decided to calculate the following regression, which considers all the macro-variables presented above, to determine the Aggravated_assault rate :
\(Assault =\beta_0 + \beta_1* ForeignBorn21 + \beta_2 * WeaponsLicences + \beta_3 * UnemployementRate + \\\beta_4 * Officers + \beta_5 * PovRate +\beta_6 * IncomeInequal\)
We find an \(R^2\) of 0.3373, which means that 34% of the variation in the aggravated assault rate is explained by the linear relationship with the macro variables.
By applying backward selection based on AIC, we are told that all the variables are rejected except Poverty rate. Our final regression is then :
\(Assault =\beta_0 + \ \beta_1 * PovRate\)
summary(regAssault)
##
## Call:
## lm(formula = Aggravated_assault ~ Pov_rate, data = Bigdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -262.716 -64.328 -4.949 34.947 244.526
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -71.908152 68.558458 -1.049 0.3
## Pov_rate 0.027249 0.005334 5.109 5.81e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 98.31 on 47 degrees of freedom
## Multiple R-squared: 0.3571, Adjusted R-squared: 0.3434
## F-statistic: 26.1 on 1 and 47 DF, p-value: 5.815e-06
| RMSE | MAE | MASE |
|---|---|---|
| 96.28532 | 70.70843 | 0.7543343 |
We note that five clearly identifiable points stand out from the red curve. Again, some states have lower than average rates of aggravated assault (Maine, New Hampshire, Connecticut), others higher (New Mexico, Tennessee). The rest of the points are very close to the curve.
Our third hypothesis was that the rate of aggravated assaults depended on the number of firearms licenses issued and the unemployment rate. This was not supported by our model, which retains only the poverty rate as a predictor of the aggravated assault rate.
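As a worked illustration of how to read this fitted model, the coefficients from the summary output give the equation below. The poverty value of 12,000 per 100,000 inhabitants is a hypothetical input chosen only for the arithmetic, not a figure taken from our data.

\(\widehat{Assault} = -71.908 + 0.027249 * PovRate\)

\(\widehat{Assault} = -71.908 + 0.027249 * 12000 \approx 255.1\)

Read this way, an increase of 1,000 in the poverty rate per 100,000 inhabitants (one percentage point) is associated with about 27 more aggravated assaults per 100,000 inhabitants.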
3.4 Determine the Rape rate through the macro variables
We started by calculating the correlation matrix of this model:
corr_matrixRape
| | Rape_net | Foreign_Born21 | Weapons_licences | Unemployement_rate | Officers | Income_Inequal |
|---|---|---|---|---|---|---|
| Rape_net | 1.000 | -0.370 | 0.468 | -0.263 | -0.056 | -0.352 |
| Foreign_Born21 | -0.370 | 1.000 | -0.582 | 0.636 | -0.036 | 0.481 |
| Weapons_licences | 0.468 | -0.582 | 1.000 | -0.429 | -0.035 | -0.420 |
| Unemployement_rate | -0.263 | 0.636 | -0.429 | 1.000 | 0.103 | 0.707 |
| Officers | -0.056 | -0.036 | -0.035 | 0.103 | 1.000 | 0.454 |
| Income_Inequal | -0.352 | 0.481 | -0.420 | 0.707 | 0.454 | 1.000 |
We decided to calculate the following regression, which considers all the macro-variables presented above, to determine the Rape rate :
\(Rape =\beta_0 + \beta_1* ForeignBorn21 + \beta_2 * WeaponsLicences + \beta_3 * UnemployementRate + \\ \beta_4 * Officers + \beta_5 * PovRate +\beta_6 * IncomeInequal\)
We find an \(R^2\) of 0.4257, which means that about 43% of the variation in the rape rate is explained by the linear relationship with the macro variables.
Backward selection based on AIC removes the ForeignBorn21 and Officers variables. Our final regression is then:
\(Rape =\beta_0 + \beta_1 * WeaponsLicences + \beta_2 * UnemployementRate + \\ \beta_3 * PovRate +\beta_4 * IncomeInequal\)
summary(regRape)
##
## Call:
## lm(formula = Rape_net ~ Weapons_licences + Unemployement_rate +
## Pov_rate + Income_Inequal, data = Bigdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.2484 -7.2032 -0.8501 3.7470 24.2894
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 49.3576978 13.5252639 3.649 0.000694 ***
## Weapons_licences 0.1571027 0.0598584 2.625 0.011884 *
## Unemployement_rate 0.0017767 0.0017675 1.005 0.320311
## Pov_rate 0.0021888 0.0006554 3.340 0.001717 **
## Income_Inequal -0.0032506 0.0011215 -2.898 0.005828 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.16 on 44 degrees of freedom
## Multiple R-squared: 0.4023, Adjusted R-squared: 0.3479
## F-statistic: 7.403 on 4 and 44 DF, p-value: 0.0001193
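The overall F-statistic can be recovered from the \(R^2\) reported in this output. As a sketch, with \(p = 4\) predictors and \(n - p - 1 = 44\) residual degrees of freedom:
\(F = \frac{R^2 / p}{(1 - R^2) / (n - p - 1)} = \frac{0.4023 / 4}{0.5977 / 44} \approx 7.40\)
This agrees with the reported 7.403 up to rounding of \(R^2\). The small associated p-value indicates that the four variables jointly explain the rape rate significantly better than an intercept-only model.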
Regarding the final regression of the rape rate, three of the four
variables considered are statistically significant, at different levels:
weapons licences, poverty rate and income inequality. They are therefore
associated with the rape rate, whereas the unemployment rate is not
significant (p = 0.32).
We check if there is a multi-collinearity issue :
| Variables | Tolerance | VIF |
|---|---|---|
| Weapons_licences | 0.777 | 1.287 |
| Unemployement_rate | 0.466 | 2.144 |
| Pov_rate | 0.708 | 1.413 |
| Income_Inequal | 0.363 | 2.752 |
We have no VIF greater than 5, so we assume that
multi-collinearity is not a concern.
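The two columns of the table carry the same information. If \(R_j^2\) denotes the \(R^2\) obtained by regressing predictor \(j\) on the other predictors, then:
\(VIF_j = \frac{1}{Tolerance_j} = \frac{1}{1 - R_j^2}\)
For example, Income_Inequal has a tolerance of 0.363, so \(R_j^2 = 0.637\) and \(VIF \approx 1 / 0.363 \approx 2.75\). Similarly, Weapons_licences gives \(1 / 0.777 \approx 1.287\). The small differences from the table come from rounding the tolerance to three decimals.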
| RMSE | MAE | MASE |
|---|---|---|
| 9.632063 | 7.460803 | 0.7593315 |
We can see that the QQ-plot has some anomalies, related to states
with higher than average rape rates, like Arkansas, South Dakota,
Colorado, Michigan… The other points are on the curve or very close.
Our fourth hypothesis was that the rape rate depended on the number of firearms licenses issued and on income inequality. Our model confirms this hypothesis and additionally finds a significant effect of the poverty rate. The unemployment rate is retained by the AIC selection but is not statistically significant (p = 0.32).
3.5 Regression
Robbery Scatter Plot Matrix
Homicides Scatter Plot Matrix
Aggravated_assault Scatter Plot Matrix
Rape Scatter Plot Matrix
4. Conclusion
The crime rate has varied widely between 1985 and 2020. Its increase
was due to the geopolitical situation of the United States, which was
going through a period of crisis, and its decrease was often related to
demographic and economic changes, such as increased immigration, wages,
and police manpower. However, we can note that the rate has been fairly
stable since 2016. The COVID crisis significantly increased the homicide
rate, for example, due to the quarantine and numerous protests as shown
in this study.
As seen from the bar plot in 2.1.5, the most common crime committed
in the U.S. is aggravated assault. In our regression model, the poverty
rate is the only variable retained for the aggravated assault rate, and
it is highly significant (p < 0.001), although it explains only about
36% of the variation. Poverty also influences the other three types of
crime, at different levels. One could then suggest solutions to lower
the poverty rate in order to lower the crime rate.
One of the particularities of the United States is that its health care system is failing and unequal. American citizens can access health care only if they can pay for it; otherwise they go into debt, which increases the poverty rate and, in turn, the number of crimes committed. Revising the health care system to make it fairer would be a great step forward for the country.
In addition, the American work life is based on the ideology of meritocracy. This leaves many citizens unemployed or in low-paying jobs, which contributes to the rising poverty rate. The creation of new jobs should massively reduce the poverty rate and the unemployment rate.
The problem of hate crimes and racism is also very present in some states, as we saw in 2020 with the “Black Lives Matter” movement following the murder of a citizen of color. The image of law enforcement officers took a hit: they were no longer taken seriously or respected by citizens, and thousands of resignations were recorded in the sector. Awareness programs should be incorporated into schools to prevent hate crimes and to teach young people about their rights as citizens. It would also be important to remobilize law enforcement to lower the homicide rate.
Finally, equity should be ensured from birth, regardless of social class and ethnicity. This would require allowing citizens to access a proper level of education without going into debt.
Our study does not, of course, take into account unsolved or
unrecorded crime and black market activity in the United States, such as
undeclared employment, illegal gun carrying, and illegal immigration.
Unfortunately, it is impossible to estimate these factors and
incorporate them into the model. Nevertheless, they certainly have an
impact on the rates we have estimated.
In addition, some states have much higher than average crime rates,
such as Alaska, New Mexico, Tennessee, and Arkansas. It might make sense
to impose a limit on the crime rate that no state should exceed;
if a state exceeds it, its governor should be required to lower the rate
by any means necessary.
As a result of our exploratory analysis of the data, we noticed that the party (Democrat or Republican) in power at a given time appeared to have a large impact on the crime rate. Crime was peaking when Democrat Bill Clinton took office in 1993; the crime rate then dropped, thanks to the introduction of new laws and increased police staffing (as explained in section 2.1.1). It would be extremely interesting to take this new variable into account to see its impact on different types of crime.